perm filename EVALU[DIS,DBL]4 blob sn#214758 filedate 1976-05-10 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00020 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00003 00002	.NSECP(Evaluating AM)
C00006 00003	.SSEC(Judging Performance)
C00012 00004	. SSSEC(AM's Ultimate Discoveries)
C00017 00005	. SSSEC(The Magnitude of AM's Progress)
C00021 00006	. SSSEC(The Quality of AM's Route)
C00034 00007	. SSSEC(The Character of the User-System Interactions)
C00041 00008	. SSSEC(AM's Intuitive Powers)
C00046 00009	. SSSEC(Experiments on AM)
C00050 00010	. SSSEC(How to Perform Experiments on AM)
C00055 00011	. SSSEC(Future Implications of this Project)
C00065 00012	. SSSEC(Comparison to Other Systems)
C00079 00013	.SSEC(Capabilities and Limitations of AM)
C00080 00014	. SSSEC(Current Abilities)
C00088 00015	. SSSEC(Current Limitations)
C00094 00016	. SSSEC(Limitations of the Agenda scheme)
C00104 00017	. SSSEC(Choice of Domain)
C00112 00018	. SSSEC(Limitations of the Model of Math Research)
C00118 00019	. SSSEC(Ultimate powers and weaknesses)
C00122 00020	.SSEC(Final Conclusions)
C00124 ENDMK
C⊗;
.NSECP(Evaluating AM)

.TURN ON "{}"

This chapter contains discussions "meta" to AM itself.

First comes an essay about judging the performance of a system like AM.
This is a very hard task, since AM has no "goal". Even using current mathematical
standards, should AM be judged on what it produced, or on the quality of the
path which led to those results, or on the difference between what it started with
and what it finally derived?

Section {SECNUM}.2 deals with the capabilities and limitations of AM.

.B48


⊗8# ⊗* What concepts can be elicited from AM now? 
With a little tuning or a few tiny additions?

⊗8# ⊗* What are some notable omissions in AM's behavior? Can the user elicit these?

⊗8# ⊗*
   What could probably be done within a couple of months of modifications?

⊗8# ⊗*
   Aside from a total change of domain, what kinds of activities does AM lack
   (e.g., proof capabilities)? Are any discoveries (e.g., analytic function theory)
   clearly beyond its design limitations?

.E

Finally, all the conclusions will be gathered together, and a short summary of 
this project will be tolerated.

.TURN OFF "{}"

.SSEC(Judging Performance)

One may view AM's activity  as a progression from an initial core of knowledge
to a more sophisticated "final"$$ As has been stressed before, AM has no
fixed goal, no "final" state. For practical purposes, however, the totality of
explorations by AM is about the same as the "best run so far"; either of these can be
thought of as defining what is meant by the "final" state of knowledge. $
body of concepts and their facets.
Then each of the following is a reasonable way to measure success, to "judge" AM:

.BN SKIP 1

λλ By AM's ultimate achievements. Examine the list of 
concepts and methods AM developed.
Did AM ever discover anything interesting yet unknown to the user?$$ 
The "user" is a human works with AM interactively, giving it hints, commands,
questions, etc.
Notice that by "new" we mean new to the user, not new to Mankind. 
This might occur if the user were a child, and AM discovered
some elementary facts of arithmetic.
This is not really
so provincial:  mathematicians take "new" to mean new to Mankind, not
new in the Universe.  I feel philosophy slipping in, so this footnote is
terminated. $ Anything new to Mankind?

λλ By the character of the difference between the initial and final states.
Progressing from set theory to number theory is much more impressive than progressing
from two-dimensional geometry to three-dimensional geometry.

λλ By the quality of the route AM took to accomplish these advances:  
How clever, how circuitous,
how many of the detours were quickly identified as such and abandoned?
 
λλ By the character of the User--System interactions: How important is the user's
guidance? How closely must he guide AM? What happens if he doesn't say anything ever?
When he does want to say something, is there an easy way to express that to AM,
and does AM respond well to it?
Given a reasonable kick in the right direction, can AM develop the mini-theories
which the user intended, or at least something equally interesting?

λλ By its intuitive heuristic powers: Does AM believe in "reasonable" conjectures?
How accurately does AM estimate the difficulty of tasks it
is considering?  
Does AM tie together (e.g., as analogous) concepts which are formally unrelated
yet which benefit from such a tie?

λλ By the results of the experiments described in
Section {[2] EXPT}.{[2] EXPTSSEC}, page {[3] EXPTPAGE}.
How fragile is the worth-numbering scheme? The task-priority scheme?
How domain-specific are those heuristics really? The set of facets?

λλ By the fact that the kinds of experiments outlined in the next section can
easily be "set up" and performed on AM.
Regardless of the experiments' outcomes, 
the features of AM which allow them to be carried
out at all are worthy of note.

λλ By the implications of this project. What can AM suggest about educating
young mathematicians (and scientists in general)?
What can AM say about doing math (about empirical research in general)?

λλ By comparisons to other, similar systems. 

.E

.ONCE TURN ON "{}"

For each of these {BNN} measuring criteria, 
a subsection will now be provided, to illustrate (i) a
stunning achievement and (ii) a stunning failure of AM along each dimension, and
(iii) to
try to objectively characterize AM's performance according to that measure.

. SSSEC(AM's Ultimate Discoveries)

Two of the ideas which AM proposed were totally new to the user:

.MAXDIVPAGE: PAGE

.MAXDIVSEC: SECNUM

.MAXDIVSSEC: SSECNUM

.BN

λλ Consider numbers with an abnormally high number of divisors.
If d(n) represents the number of divisors of n,$$e.g., d(12) = ||α{1,2,3,4,6,12α}||
= 6. $ then AM defines the set of "maximally-divisible numbers" to be
⊗6α{nεN | (∀m<n) d(m)<d(n)α}⊗*. By factoring each such number into primes,
AM noticed a regularity in them. (A small program for computing these
numbers is sketched just after this list.) The author then developed a
"mini-theory" about these numbers. It later turned out that Ramanujan had
already proposed that very same definition (in 1915), and had found that
same regularity. His results only partially overlap those of AM and the
author, however.

λλ 
⊗1AM found a cute geometric application of Goldbach's conjecture. Given the set of
all angles of prime degree, from 0 to 180↑o,$$Included are 0↑o and 1↑o, as well
as the "typical" primes 2↑o, 3↑o, 5↑o, 7↑o, 11↑o,..., 179↑o. $ then ⊗4any⊗* angle
between 0 and 180 degrees can be approximated to within 1↑o by adding a pair of
angles from this prime set.
In fact, it is hard to find smaller sets than this one which approximate any
angle to that accuracy.

.E
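
To make the first discovery concrete, here is the promised sketch (in
Common Lisp rather than AM's INTERLISP; the function names and the
bound are mine, not AM's):

.B816

; d(n): the number of divisors of n, found by trial division.
(defun d (n)
  (loop for m from 1 to n count (zerop (mod n m))))

; The maximally-divisible numbers below LIMIT: those n such that
; every m < n satisfies d(m) < d(n).
(defun maximally-divisible (limit)
  (loop with best = 0
        for n from 1 below limit
        when (> (d n) best)
          collect n and do (setq best (d n))))

; (maximally-divisible 50)  =>  (1 2 4 6 12 24 36 48)

.E

Factoring the members of that list into primes exposes a regularity of
the sort AM noticed: written as 2↑a3↑b5↑c..., the exponents never
increase (a≥b≥c≥...).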

By and large, the other concepts which AM developed were either already-known,
or real losers. Here is one example of a real turkey:

AM composed Set-insert with the predicate Equality (i.e., absolute equality
of two sets). The result was an operation Insert-o-Equal(x,y,z), which
first tested whether x was Equal to y or not. The result of this was either
T or F$$ Actually, in LISP, it was easier to have such results
always be either T or NIL$. Next, this result was inserted into z.
For example, Insert-o-Equal({1,2},{3,4},{5,6})={F,5,6}. The first two
arguments are not equal, so F was inserted into the third.
This is a real loser of an operation, if ever there was one.
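
For concreteness, here is the losing composition spelled out (a sketch
in Common Lisp rather than AM's INTERLISP; Set-insert and Equality are
modeled directly, and the symbols T and F stand for the truth values
mentioned in the footnote):

.B816

; Equality: absolute equality of two sets (as unordered lists).
(defun set-equal (x y)
  (and (subsetp x y :test #'equal) (subsetp y x :test #'equal)))

; Set-insert: add an element to a set.
(defun set-insert (e z)
  (adjoin e z :test #'equal))

; The composition AM built: test x against y, then insert the
; resulting truth value itself into z.
(defun insert-o-equal (x y z)
  (set-insert (if (set-equal x y) 'T 'F) z))

; (insert-o-equal '(1 2) '(3 4) '(5 6))  =>  (F 5 6)

.E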

Another kind of loser occurred whenever AM entered upon some "regular" behavior.
For example, if it decided that Compose was interesting, it might try to create
some examples of compositions. It could do this by picking two operations and
composing them. What better operations to pick than Compose and Compose!
Thus Compose-o-Compose would be born. By composing that with itself, an even
more monstrous operation is spawned: Compose-o-Compose-o-Compose-o-Compose.
Since AM actually uses the word "Compose" instead of that little infix circle, the
PNAME of the data structure it creates is horrendous. Its use is almost
nonexistent: it must take 5 operations as arguments, and it returns a new
operation which is the composition of those five.

In summary, then, we may say that AM produced a few  ideas
new to the author,
a couple of which
were new to Mankind (although of questionable interest!). Several additional
"new" concepts were created which both AM and the user agreed were better forgotten.
The "level" of AM's fruits
could be classified as that of a high-school student, although this is
deceptive since AM lacks the ⊗4breadth⊗* of abilities any human being possesses.

. SSSEC(The Magnitude of AM's Progress)

We can ask the following kind of question: how many "levels" did AM progress
along? This is a fuzzy notion, but basically we shall say that a new level is
reached when a valuable new 
bunch of connected concepts is
defined in terms of concepts at a lower level.

For example, AM started out knowing about Sets and Set-operations. 
When it progressed
to numbers and arithmetic, that was one big step up to a new level. When it
zeroed in on primes, unique-factorization, and divisibility, it had moved up another
level.

When fed simple geometry concepts, AM moved up one level when it defined some
generalizations of the equality of geometric figures (parallel lines,
congruent and similar triangles, angles equal in measure) and their invariants
(rotations, translations, reflections).

The above few examples are unfortunately exhaustive: that just about sums up
the major advances AM made.
Its progress was halted not so much by cpu time and space as by a paucity of
meta-knowledge: heuristic rules for filling in new heuristic rules.
Thus AM's successes are finite, and its failures infinite, along this dimension.

A more charitable view might compare AM to a human who was forced to start from
set theory, with AM's sparse abilities. In that sense, perhaps, AM would rate
quite well. The "unfair" advantage it had was the presence of many heuristics
which themselves were gleaned from mathematicians: i.e., they are like compiled
hindsight. A major purpose of mathematics education in the university is to
instill these heuristics into the minds of the students.

AM is thus characterized as possessing heuristics which are powerful enough
to take it a few "levels" away from the kind of knowledge it began with,
but ⊗4only⊗* a few levels. The limiting factors are (i) the heuristic rules
AM begins with, and more specifically (ii) the expertise in recognizing and
compiling new heuristics, and more generally (iii) a lack of real-world
situations to draw upon for analogies, intuitions, and applications.

. SSSEC(The Quality of AM's Route)

No matter what its achievements were, or the magnitude of its
advancement  from   initial  knowledge,  AM  could  still  be  judged
"unintelligent" if, e.g.,  it were exploring  vast numbers of  absurd
avenues for each worthwhile one it found. The quality of the route AM
followed is thus quite significant.

AM performed  better in this respect than expected, although much
worse than a real mathematician  would have done.  Of the  hundred or
so new concepts it defined,  at least fifty were acceptable -- in the
sense that  one would  defend AM's  reasoning in  at least  exploring
them, that a human mathematician might have considered them. Of these
fifty "winners",  about a dozen were significant  -- that is, useful,
catalytic, well-known by human  mathematicians, etc.   Unfortunately,
the fifty concepts  which were losers were ⊗4real⊗* losers.   In this
respect,  AM fell far  below the standards a  mathematician would set
for acceptable behavior:  ⊗4all⊗* his failures  should have at  least
seemed  promising at  the beginning.   Half  of AM's  adventures were
poorly grounded, and (perhaps due to a lack of intuition) AM bothered
with concepts which were "obviously" trivial: the set of even primes,
the set of numbers with only one divisor, etc.

Once  again  we must  observe  that the  quality  of the  route  is a
function of the quality of the heuristics.  If there are  many clever
little rules,  then the  steps AM  takes will  often seem  clever and
sophisticated.   If the rules superimpose nicely, joining together to
collectively buttress some specific activity, then they may even seem
surprisingly effective to their creator.

Such moments  of great insight (i.e., where  AM's reasoning surpassed
mine) did occur, although rarely. Both of AM's "biggie discoveries"  started
by its  examining concepts I  felt weren't  really interesting.   For
example,  I  didn't like  AM  spending so  much  time  worrying about
numbers with many  divisors; I  "knew" that the  converse concept  of
primes was infinitely more valuable. And yet AM saw no reason to give
up  on maximally-divisible numbers;  it had several  good reasons for
continuing that inquiry (they  were the converse to primes  which had
already proved  interesting, their frequency within  the integers was
neither very high nor very low nor very regular, their definition  was
simple,   they   were  extremals   of   the   interesting   operation
"Divisors-of",  etc.,  etc.)  Similarly,  I  "knew"  that  Goldbach's
conjecture was useless, so I was unhappy that AM was bothering to try
to apply it in  the domain of geometry.  In  both cases, AM's reasons
for  its actions were unassailable, and in  fact it did discover some
interesting new ideas both times.

Sometimes  AM's  behavior was  displeasing,  even  though  it  wasn't
"erring".      Occasionally it was  simultaneously   developing   two
mini-theories (say primes and  maximally-divisibles).  Then it  might
pick a task or two  dealing with one of these topics, then  a task or
two dealing with the other topic, etc. The task picked at each moment
would be the one  with the highest priority value.   As a theory   is
developed, the interestingness of its associated tasks goes up and
down;  there may be doldrums for a  bit, just before falling into the
track that  will lead to  the discovery  of a valuable  relationship.
During these  temporary lags, the interest value  of tasks related to
the other theory's concepts will appear to have a higher priority value: i.e.,
better reasons supporting it. So AM would  then skip over to one of ⊗4those⊗*
concepts, develop it  until ⊗4its⊗* doldrums, then return to the first
one, etc.  Humans found this behavior unpalatable$$ Although it might
be the  "best" from a dynamic  management point of view,  it probably
would be  wrong in the long run.  Major advances really do have lulls
in their development. $ because  AM had no compunction  about skipping
from one  topic to another. Humans  have to retune their  minds to do
this skipping, and therefore treat it  much more seriously.  For  that
reason, AM was given an extra mobile reason  to use for certain tasks
on its agenda: "focus of attention". Any task with the same kind of
topic as the ones just executed is given this extra reason, and it
raises their priority values a little.   This was enough sometimes to
keep AM working on a certain mini-theory when it otherwise would have
skipped somewhere else.
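
In sketch form, the mechanism is just this (Common Lisp; the structure,
the accessors, and the rating of 75 are invented here for illustration):

.B816

(defstruct task topic action reasons) ; reasons: list of (label . rating)

(defvar *last-topic* nil)             ; topic of the task just executed

; Give any task about the topic AM just worked on the extra, mobile
; "focus of attention" reason, nudging its priority value up a little.
(defun note-focus-of-attention (task)
  (when (eq (task-topic task) *last-topic*)
    (pushnew '(focus-of-attention . 75) (task-reasons task)
             :key #'car)))

.E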

The above "defect"  is a  cute little kind  of behavior AM  exhibited
which was non-human but  not clearly "wrong".  There were ⊗4genuine⊗*
bad moments also, of  course. For example,  AM became very  excited$$
This anthropomorphism must  be excused. Technically, we may  say that
the priority value of the best job on the agenda is the "level of
excitement" of AM. 1000 would be ecstasy, 100 boredom, and 0 sensory
deprivation. $ when the conjunction of "empty-set" and other concepts
kept being equal to empty-set. AM kept repeating conjunctions of this
form, rather  than stepping back  and generalizing  this data into  a
(phenomenological)  conjecture.     Similar  blind  looping  behavior
occurred when AM kept composing  Compose with itself, over and  over.
In general, one could say that "regular" behavior of any kind signals
a  probable fiasco. A  heuristic rule to  this effect  halted most of
these disgraceful antics.  This rule had to be careful, since  it was
almost the antithesis  of the "focus of attention"  idea mentioned in
the preceding  paragraph.  Together, those two rules seem to say that
you should continue  on with the kind  of thing you were  just doing,
but not for ⊗4too⊗* long a time.

The moments of insight were two in number; the moments of stupid
misdirection were about twenty times as many.

AM  had   very  few  heuristics  for  deciding   that  something  was
⊗4un⊗*interesting, that  work  on it  should  halt for  a  long  time.
Rather, AM simply would not have anything  positive to say about that
concept, and other concepts would be explored instead, essentially by
default.  Each  concept had a worth  component which corresponded  to
its right to life (its right  to occupy storage in core). This number
slowly   declines  with  time,  and   is  raised  whenever  something
interesting happens  with that  concept.  If  it ever  falls below  a
certain threshold, the concept is forgotten: its list cells are
garbage collected, and all  references to it  are erased, save  those
which  will  keep it  from  being  re-created.   This  again  is  not
purposeful forgetting, but rather by default; not because X is seen as
a dead-end,  but simply  because  other concepts  seem so  much  more
interesting for a long time.
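
A sketch of that "right to life" bookkeeping (Common Lisp; the decay
rate and the threshold are invented -- only the shape of the mechanism
is AM's):

.B816

(defstruct concept name worth facets)

(defparameter *forget-threshold* 75)

; Called periodically: every concept's worth decays slowly; concepts
; falling below the threshold are forgotten. (Here they are merely
; dropped from the list; AM also erased most pointers to them.)
(defun decay-and-forget (concepts)
  (dolist (c concepts)
    (setf (concept-worth c) (round (* 9/10 (concept-worth c)))))
  (remove-if (lambda (c) (< (concept-worth c) *forget-threshold*))
             concepts))

; Anything interesting happening to a concept raises it back up:
(defun reward (c bonus) (incf (concept-worth c) bonus))

.E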

Thus AM did  not develop the fifty "losers" very  much: they ended up
with  an average of only 1.5 tasks  relevant to them ever having been
chosen.   The  "winners" averaged  about twice  as  many tasks  which
helped fill  them out more.   Also, the  worth ratings of  the losers
were far below  those of the  winners.   So AM really  did judge  the
value of its new concepts quite well.

The final  aspect of this  important dimension  of evaluation is  the
quality of the  reasons AM used to support each  task it chose to work
on.   Again, the  English phrases  corresponded quite  nicely to  the
"real" reasons a human might give  to justify why something was worth
trying,  and the ordering of  the tasks on the  agenda was rarely far
off from the  one that I would  have picked myself. This  was perhaps
AM's greatest success: the rationality of its actions.

. SSSEC(The Character of the User-System Interactions)

AM is not a "user-oriented" system. There  were many nice features in
the  proposal for AM  which never got  off the drawing  board. At the
heart of these features were two assumptions:

.BN

λλ The user  must understand  AM, and  AM must likewise  have a  good
model of the particular human using  him. The only time either should
initiate  a message  is when his  model of the  other is  not what he
wants  that model  to  be.   In  that  case,  the message  should  be
specifically designed to fix that discrepancy.$$ This idea was motivated
by a lecture given in 1975 by Terry Winograd$

λλ  Each kind of  message which is  to pass  between AM and  its user
should have its  own appropriate language.   Thus there  should be  a
terse comment language, whereby the user can  note how he feels about
what AM is doing, a questioning language for either party to get/give
reasons to the  other, a picture  language for communicating  certain
relationships, etc.

.E

Neither of these  ideas ever made it  into the LISP code  that is now
AM,  although they are  certainly not  prohibited in any  way by AM's
design. It would  be a separate project,  at the level of  a master's
thesis I expect, for someone  to build a nice user interface for AM$$
I am not actually calling for this to be done, merely indicating  the
magnitude of the effort involved. A VERY nice user interface might be
much harder, at the level of a dissertation. $.

As one might expect, the reason for this atrophy is simply that very
little guidance from the user was needed by AM. In fact, all the
discoveries, cpu time quotas, etc. mentioned in this document are
taken from totally unguided runs by AM. If the user guides as well as
he can, then about a factor of 3 speedup is possible. Of course, this
assumes that  the  user  is dragging  AM  directly  along a  line  of
development he knows will be successful. The user's "reasons" at each
step are based  essentially on  hindsight. Thus  this is  not at  all
"fair". If AM  ever becomes more  user-oriented, it would be  nice to
let children (say 6-12 years old)
experiment  with it,  to  observe them  working with  it in
domains unfamiliar to either of them.

.ONCE TURN ON "{}"

The user  can  "kick"  AM  in  one direction  or  another,  e.g.,  by
interrupting  and telling  AM  that Sets  are  more interesting  than
Numbers$$  To  actually do  this,  the  user will  type  control-I to
interrupt AM.  He  then types  I, meaning  "alter  the interest  of",
followed  by the  word "Sets".  AM then  asks whether  this is  to be
raised or lowered. He types back R,  and AM asks how much, on a  1-10
scale. He  replies  9, say,  and then  repeats this  process for  the
concept "Numbers".  $.  In that particular  case, AM fails to develop
any higher-level set concepts (diagonalization, infinite sets,  etc.)
and simply  wallows around  in finite set  theory (de  Morgan's laws,
associativity  of Union, etc.).   When geometric  concepts are input,
and AM  is  kicked  in ⊗4that⊗*  direction,  much nicer  results  are
obtained. See Experiment 5, page {[3] GEOEX}.


There is one important result to observe: the very best examples of
AM in action were brought to full fruition only by a human developer.
That is, AM thought of a couple great concepts, but couldn't develop them
well on its own. A human (the author) then took them and worked on them
by hand, and interesting results were achieved. These results could be
told to AM, who could then go off and look for new concepts to investigate.
This interaction is of course at a much lower frequency than the kind
of rapid-fire question/answering talked about above. Yet it seems that
such synergy may be the ultimate mode of AM-like systems.

. SSSEC(AM's Intuitive Powers)

Let me hasten to mention that the word "intuitive" in this subsection's
title is not limited to the
Intuition facets  of the concepts.  What is  meant is the totality of
plausible  reasoning  which  AM  engages  in:  empirical   induction,
generalization, specialization,  maintaining reasons  for jobs  on the
agenda list, creation of analogies between bunches of concepts, etc.

AM  only considers conjectures which  have been explicitly suggested:
either by empirical evidence, by analogy, or (now de-implemented) by
Intuition facets.   Once a conjecture has  been formulated, it  is tested in
all ways possible:  new experimental evidence  is sought  (especially
extreme cases),  it is examined  formally to see  if it  follows from
already-known conjectures, etc.

Because of  this grounding in plausibility,  the only conjectures the
user ever sees  (the ones AM  is testing) are  quite believable.   If
they  turn out  to be  false,  both he  and  AM are  surprised.   For
example, both AM and the user were disappointed when nothing came out of the
concept of  Uniquely-prime-addable numbers  (positive integers  which
can be  represented as the sum  of two primes in  precisely one way).
Several conjectures  were  proposed  via analogy  with  unique  prime
factorization, but  none of  them held  experimentally. Each of  them
seemed worth investigating, to both the user and the system.

AM's estimates  of the value of each task  it attempts were often far
off from what hindsight proved their true values to be. Yet  this was
not so different  from the situation a real researcher  faces, and it
made little difference to the discoveries and failures of the system.
AM  occasionally mismanaged  its  resources  due to  errors  in  these
estimates.   To  correct for  such erroneous  prejudgments, heuristic
rules were permitted to dynamically  alter the time/space quanta  for
the current task. If some interesting new result turned up, then some
extra resources would be allotted. If certain heuristics failed, they
could reduce the time limits, so not as much total cpu time  would be
wasted on this turkey.
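
The shape of that correction, as a sketch (Common Lisp; the units, the
initial values, and the factor of two are all invented):

.B816

(defvar *time-quantum* 20)    ; cpu seconds allotted to the current task
(defvar *space-quantum* 500)  ; list cells allotted to it

; Heuristic rules could call these while the current task runs:
(defun extend-quanta ()       ; an interesting new result just turned up
  (setq *time-quantum* (* 2 *time-quantum*)
        *space-quantum* (* 2 *space-quantum*)))

(defun curtail-quanta ()      ; key heuristics failed; cut the losses
  (setq *time-quantum* (max 1 (floor *time-quantum* 2))
        *space-quantum* (max 1 (floor *space-quantum* 2))))

.E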

An example of  a nice conjecture is the unique  factorization one.  A
nice  analogy was the one between  angles and  numbers (leading  to the
application  of Goldbach's  conjecture).  Another  nice  analogy  was
between numbers  and bags (and hence between  bag-operations and what
we commonly call arithmetic operations).

Some poor analogies were  considered, like the  one between bags  and
singleton-bags.  The ramifications  of  this analogy  were  painfully
trivial$$ The  bag-operations, applied to singletons, did not produce
singletons  as  their  result:  (x)∪(y)  is  (x,y)  which  is  not  a
singleton. Whether they  did or not depended only on  the equality or
inequality  of the  two arguments.  There were many  tiny conjectures
proposed which merely re-echoed this general conclusion. $.

. SSSEC(Experiments on AM)

.ONCE TURN ON "{}"

The experiments described in Section {[2] EXPT}.{[2] EXPTSSEC}, page
{[3] EXPTPAGE} provide  some results relevant to the overall value of
the AM system.  The  reader should consult that section for  details;
neither the experiments  nor their results will be repeated  here.  A
few  conclusions will be  summarized, to  show that AM  fared well in
this dimension of evaluation.

The worth-numbering scheme  for the concepts  is fairly robust:  even
when all the concepts' worths are initialized at the same value, the
performance of AM doesn't collapse, although it is noticeably
degraded.

Certain mutilations of the priority-value scheme  for the agenda will
cripple  AM, but  it can resist  most of  the small changes  tried in
various experiments.

The heuristics are specific to their stated domain  of applicability.
Thus when working in geometry,  the Operation heuristics were just as
useful as they were when AM worked in elementary set theory or number
theory. The set of facets seemed adequate for those domains, too. The
Intuition facet, which was  rejected as a valid source of information
about sets and numbers, might have been more acceptable in  geometry
(e.g.,  something  similar  to  Gelernter's   model  of  a  geometric
situation).

All in all, then,  we conclude that AM was fairly tough, and about as
general as its heuristics claimed  it was. AM is not invincible, infallible, or
universal.   Its strength lies  in careful  use of
heuristics. If there aren't enough domain-specific heuristics around,
the system  will simply not  perform well  in that  domain.  If  the
heuristic-using  control structure  of  AM is  tampered  with$$ e.g.,
treat all reasons  as equivalent,  so you  just COUNT  the number  of
reasons a task has, to determine its priority  on the agenda.$,
there  is some chance of  losing vital guiding  information which the
heuristics would otherwise supply.

. SSSEC(How to Perform Experiments on AM)

.ONCE TURN ON "{}"

The very  fact that the  kinds of experiments  mentioned in  the last
section  (and   described  in  detail  in  Section  
{[2]EXPT}.{[2]EXPTSSEC}, 
page {[3]EXPTPAGE}) can easily be "set up"  and performed
on AM, reflects a nice quality of the AM program.

Most of those experiments  took only a few minutes to  set up, only a
few  tiny modifications  to AM.  For example,  the one where  all the
Worth ratings were initialized to the same value was done by evaluating
the single LISP expression:

.B816

(MAPC CONCEPTS '(λ (c) (PUT c 'Worth 200)))

.E

Similarly, here is how AM  was modified to treat all tasks as if they
had equal value: the function Pick-task has a statement of the form

.B816

(SETQ Current-task (First-member-of Agenda))

.E

All that  was necessary  was  to replace  the  call on  the  function
"⊗6First-member-of⊗*"$$   In   LISP,  this   function   is   actually
abbreviated "CAR". $ by the function "⊗6Random-member-of⊗*".

Even  the most  sophisticated experiment, the  introduction of  a new
bunch of  concepts  --  those  dealing with  geometric  notions  like
Between, Angle, Line -- took only a few hours to set up.$$ I had been
thinking  about this  particular  experiment off  and on  for several
months. When I started to code the new concepts, I'd probably already
thought out what they should be fairly clearly. $

There  are  certain  experiments  one  can't easily  perform  on  AM:
removing all  its  heuristics,  for example.  Most  heuristic  search
programs would  then  wallow around,  displaying just  how big  their
search space really was. But AM would just sit there, since it'd have
nothing plausible to do.

Many other  experiments, while cute  and easy  to set  up, are  quite
costly in terms of cpu time. For example, the class of experiments of
the  form: "remove heuristics  x, y, and z, and  observe the resultant
effect on AM's behavior". This observation would entail running AM
for an hour or two of cpu time!  Considering the number of subsets of
heuristics, not all these questions are going to get answered in  our
universe's lifetime. Considering  the small  probable payoff  from any one  such
experiment, very few should actually be attempted.

Most of the experiments one could think of can be quickly set up
-- but only by someone familiar with the LISP code of AM.
It would be quite hard to modify AM so that the untrained user could
easily perform these experiments. Essentially, that would demand that
AM have a deep understanding of its own structure. This is of course
desirable, fascinating, challenging, but wasn't part of the design of
AM.

. SSSEC(Future Implications of this Project)

One harsh measure of AM would be to ask what possible applications
it will have. This  really means (i) the uses for the AM system, (ii)
the  uses for  the  ideas  of  how  to  create  such  systems,  (iii)
conclusions about math and science one can draw from experiments with
AM.

Here are some of these implications, both real and potential:


.BN PREFACE 1 INDENT 0,3,0

λλ  New tools  for  computer  scientists  who want  to  create  large
knowledge-based systems to emulate any creative human activity.

.BEGIN PREFACE 0 INDENT 8

⊗61a.⊗* The  modular representation of  knowledge that AM  uses might
prove to be effective in any  knowledge-based system.  Division of  a
global problem into a multitude of small chunks,  each of them of the
form  of setting up  one quite local  "expert" on some  concept, is a
nice way  to make  a hard  task more  managable.   Conceivably,  each
needed expert could be  filled in by a human who  really is an expert
on that topic.  Then the global abilities of the system would be able
to rely  on quite  sophisticated local  criteria.   Fixing  a set  of
facets once and for all permits effective inter-module communication.

⊗61b.⊗* The  ideas about heuristics having  domains of applicability,
of tacking  them onto  the most  general knowledge  source  (concept,
module) they are relevant to, rippling, etc., may all be carried over
unchanged  into  many  fields  of  human  creativity, wherever  local
guiding rules exist.

.E

λλ A body of heuristics which can be built upon by others.

.BEGIN PREFACE 0 INDENT 8

⊗62a.⊗* Most  of  the particular  heuristic judgmental  criteria  for
interestingness,  utility,   etc.,  might  be  valid   in  developing
theorizers  in other sciences.  Recall that  each rule has its domain
of applicability; many of the heuristics in AM are quite general.

⊗62b.⊗* Just within the small  domain AM already works in,  this base
of  heuristics  might   be  enlarged  through  contact  with  various
mathematicians. If they  are willing  to introspect and  add some  of
their "rules" to AM's existing base, it might gradually grow more and
more powerful.

.E

λλ New and better strategies for math educators.

.BEGIN PREFACE 0 INDENT 8

⊗63a.⊗*   If  the  repertoire  of   intuition  (simulated  real-world
scenarios) were sufficient for  AM to develop elementary concepts  of
math, then educators should ensure  that children (4-6 years old) are
thoroughly exposed to those scenarios.  Such activities would include
seesaws,  slides,  piling marbles  into  pans  of  a  balance  scale,
comparing the heights of  towers built out of cubical blocks, solving
a jigsaw puzzle, etc.  Unfortunately, AM failed to show the value  of
these few scenarios.  This was a  potential application which was not
confirmed.

⊗63b.⊗* Since the key to AM's success seems to be its heuristics, and
not the  particular  concepts  it  knows, the  whole  orientation  of
mathematics education should be  modified, to provide experiences for
the  student which will build up these  rules in his mind. Learning a
new theorem is  worth much less than  learning a new heuristic  which
lets you discover  new theorems. I am no doubt  far from the first to
urge such a revision.

⊗63c.⊗* A use for AM itself would  be as a "fun" teaching tool. If  a
nice user interface  is constructed, AM  could serve as a  model for,
say,  college freshmen with  no math research  experience. They could
watch AM, see the kinds of things it does, play with it,  and perhaps
get a real flavor  for (and get turned on by)  doing math research. A
vast  number of  brilliant minds  are too  turned off  by high-school
drilling and college calculus to stick around long enough to find out
how  exciting  --  and different  --  research  math  is compared  to
textbook math.

.E

λλ Further experiments on  AM might tell us  something about how  the
theory formation  task changes as  a theory grows  in sophistication.
For example, can  the same methods which lead AM from premathematical
concepts to  arithmetic  also  lead  AM from  number  systems  up  to
abstract algebra? Or are new heuristic rules or extra
concepts  required?   My  guess is  that a  few  of each  are lacking
currently, but ⊗4only⊗* a few.  There is a great deal of disagreement
about this subject among mathematicians.
By tracing along the development of mathematics, one might categorize
discoveries by how easy they would be for an AM-like system to find.
Sometimes, a discovery required the invention of a brand new heuristic
rule, which would clearly be beyond AM as currently designed.
Sometimes, discovery is based on the lucky random combination of
existing concepts, for no good ⊗4a priori⊗* reason. It would be
instructive to find out how often this is necessarily the case: how
often ⊗4can't⊗* a mathematical discovery be motivated and "explained"
using heuristic rules of the kind AM possesses?

λλ An  unanticipated result was  the creation of  new-to-Mankind math
(both  directly  and  by   defining  new,  interesting  concepts   to
investigate by hand).   An even  more exciting prospect,  which never
materialized,  was that AM  would find  a new redivision  of existing
concepts, an alternate formulation  of some established theory,  much
like Hamiltonian  mechanics is an  alternate unification of  the data
which  led  to Newtonian  mechanics.   Since  AM is  a  collection of
heuristic rules, it could  be augmented by several  individuals. This
kind  of system  could  then conceivably  integrate  their particular
insights, and do quite spectacular mathematics. This is still in  the
realm of science fiction, of course.

.E

. SSSEC(Comparison to Other Systems)


One popular way to judge a system is to  compare it to other, similar
systems, and/or to  others' proposed criteria for such systems. There
is virtually  no similar  project  known to  the author,  despite  an
exhaustive search (weigh the Bibliography). A couple of tangential
efforts will be mentioned, followed by a discussion of how AM
measures up to the "understanding-program" standards set forth by
Moore and Newell in their MERLIN paper.

Several projects have been undertaken which comprise a small piece of
the proposed system,  plus deep  concentration on  some area  ⊗4not⊗*
under study here.  Boyer and Moore's  theorem-prover embodies some of
the  spirit of this  effort (e.g.,  generalizing the definition  of a
LISP  function),  but  its  motivations  are  quite   different,  its
knowledge base is  minimal, and its methods purely  formal.$$ This is
not  meant as criticism; considering the  goals of those researchers,
and the age of that system, their work is quite  exciting.  $ Badre's
CLET system worked on learning the decimal addition algorithm$$ Given
the addition table up to 10 + 10, plus an English text description of
what it means to carry, how and when to carry, etc., actually write a
program   capable  of   adding  two   3-digit   numbers  $   but  the
"⊗4mathematics discovery⊗*"  aspects  of  that  system  were  neither
emphasized  nor  worth emphasizing;  it  was  an interesting  natural
language  communication study.   The same comment  applies to several
related studies by IMSSS$$See [Smith], for example.$.

Many researchers  have constructed programs  which pioneered some  of
the techniques  AM uses$$ In  many cases, those  techniques were used
for the first time, hence were thought of as "tricks". $.   Gelernter
used prototypical  examples  as analogic  models to  guide search  in
geometry, and  Bundy has used "sticks" to  help his program work with
natural numbers.  Kling has studied the single heuristic  of analogy,
and Brotz  has written  a system  which uses  this to  propose useful
lemmata;  both  of these  computer  programs  are set  up  as theorem
provers, again not as discoverers.

One aspect  that each  of these  systems  lacked was  size: they  all
worked in tiny toy domains, with minuscule, carefully prearranged
knowledge bases, with just enough information to do the job well, but
not so much that the  system might be swamped. AM is open  to all the
advantages and  all the dangers of a  non-toy system with a massive$$
by contemporary standards.  This data base will no doubt seem pitiful
a generation from now. $ corpus of data to manage.  The other systems
did   not  often  contain  multiple   representations  for  the  same
knowledge, and they didn't rely on a large base of heuristic rules to
propose  plausible   tasks  to  work  on,  and   their  behavior  was
problem-solving rather than theory formation.  They were  goal-driven
(the topmost goal being a desired theorem  to prove).  Certainly none
has  considered the  paradigm of  ⊗4discovery  and evaluation  of the
interestingness of structure⊗*;  the others have  been "here is  your
task, try  and prove it," or,  in Badre's case, "here  is the answer,
try and translate/use it."

One  exception to  this will be  Paul Cohen's  math discovery system,
just entering the planning stages at the moment. The emphasis of that
program  will be on  formal proof,  on random logical  connections of
existing concepts (theorems) into new ones, interspersed with periods
of  critical evaluation  and  pruning. Heuristic  rules  will not  be
present  to propose  new concepts  or future tasks.   This  design is
similar to that of the early "learning by evolution" programs.

Theory formation systems in ⊗4any⊗* field have been few. Meta-dendral
represents the  best of these.  Its task is to  unify a body  of mass
spectral  data, examples of "proper" identifications of spectra, into
a small  body of  rules for  making identifications.  Thus even  this
system  is  given  a  fixed  task,  a  fixed  set  of  data  to  find
regularities within.  AM, however, must  find its own data, and  take
the responsibility  for managing its  own time,  for not looking  too
long  at worthless data (e.g.,  maximally-divisible numbers, which we
all know have no regularity!).

Besides "math systems", and "creative thinking systems",  and "theory
formation systems",  we should at  least discuss others'  thoughts on
the  issue of algorithmically doing math  research.  Some individuals
feel it  is not  so  far-fetched to  imagine automating  mathematical
research  (e.g., Paul Cohen).   Others  (e.g., Polya)  would probably
disagree.

There is very little thought  about discovery in mathematics from  an
algorithmic  point  of  view; even  clear  thinkers  like  Polya  and
Poincare'  treat  mathematical  ability as  a  sacred,  almost mystic
quality, tied to the unconscious.   The writings of philosophers  and
psychologists  invariably attempt  to examine  human performance  and
belief,  which are far more manageable  than creativity ⊗4in vitro⊗*.
Belief formulae in inductive logic (e.g., Carnap, Hintikka,
Pietarinin)  invariably  fall  back  upon  how well  they  fit  human
measurements. The abilities of a computer and a brain are too
distinct to justify blindly demanding that the results (let alone the
algorithms!) of the one match those of the other.

Preceding  subsections have discussed  many criteria  for the system.
Moore and Newell have published some reasonable design issues for any
proposed understanding  system, and we  shall now see how  our system
answers  their  questions$$ Each  point  of the  taxonomy  which they
provide before these questions is covered by the AM system.$.  Recall
that a BEING is the name of the kind of knowledge module representing
one concept, the data  structure corresponding to  a bunch of  facets
about that concept.

.BN

λλ  Representation: complex  BEINGs,  simple situation/action  rules,
opaque functions.  Each BEING represents one very specialized expert,
one mathematical  concept.   Partial  knowledge about  a topic  X  is
naturally expressed as an incomplete BEING X.  Each differently-named
facet has its own format.

λλ  Action:  Most knowledge  is  stored in  facets of  concepts  in a
nearly-executable way; the remainder  is stored so that  the "active"
segment can  easily use it as it  runs.  The place that  any piece of
information is stored is carefully chosen  so that it will be  evoked
just in "fitting" situations.  The only real  action in the system is
the  selective completion  of concepts'  facets, with  the occasional
creation of a new concept.

λλ Assimilation: There is  no sharp distinction between the  internal
knowledge and the system's goal; the goal is really nothing more than
to  extend the  given knowledge  while maintaining both  the priority
value of the current task and the worth values of newly-created
concepts.  The only external  entities are the user and the simulated
physical world.   Contact with  the first is  through a  simpleminded
translation  scheme, with  the latter  through  evaluation of  opaque
functions on observable data and examination of the results.

λλ Accommodation: this is not exhibited to any high degree by AM.

λλ Directionality: Relevant  knowledge is gathered up at each step to
satisfy the current  task chosen from  the agenda.   This is done  by
rippling  away from  the concept  mentioned in  that task.   At  each
stage, there  will be thousands of unfilled-in facets, and the system
simply chooses the most interesting one to work on.

λλ Efficiency: The contents of the facets exist both in compiled form
and in inspectable form.  Communication with a human user takes place
very rarely, and is very  "clean" when it does  occur, so it isn't  a
bottleneck.   AM  is  an  informal  system, relying  on  a  tentative
calculus of interestingness,  worth numbers, and priority values.  At
the moment, there are no concepts who are experts on Bugs, Debugging,
Contradiction,  etc.   AM  ignores the  frame  problem, and  resolves
paradoxes at contradiction-time.

λλ  Depth of Understanding: Each concept  is an expert. His knowledge
consists of rules  which trigger in  appropriate situations.  AM  has
good  abilities  to add  to  facets  of  existing concepts;  mediocre
abilities to synthesize new concepts; limited abilities to manipulate
and create new heuristic rules.

.E

.SSEC(Capabilities and Limitations of AM)

The first subsection contains a general discussion of what AM can and
can't do.  Later subsections deal with  powers and limitations inherent in using
an agenda scheme, in fixing the domain of AM, and in picking one specific
model of math research to build AM upon.
Finally, some speculation is made  about the ultimate
powers and weaknesses of any systems which are designed very much like AM.

. SSSEC(Current Abilities)

⊗4What fields has AM worked in so far?⊗* AM  is now able to explore a
small  bit of  the theory  of  sets, data  types, numbers,  and plane
geometry.  It by no means has been fed -- nor has it  rediscovered --
a large fraction of what is known in any of those fields. It might be
more  accurate to be humble and  restate those domains as: elementary
finite set  theory, trivial  observations  about four  kinds of  data
types,  arithmetic  and elementary  divisibility  theory, and  simple
relationships  between   lines,   angles,   and   triangles.   So   a
sophisticated concept in each domain -- which was discovered by AM --
might  be: de  Morgan's laws,  Delete-o-Insert  never alters  Bags or
Lists, unique factorization, similar triangles.

⊗4Can AM work in a new field,  like politics?⊗* AM can work in a  new
elementary, formalized  domain, if it  is fed a supplemental  base of
conceptual primitives for that domain.  To work in plane geometry, it
sufficed to give AM about twenty new primitive concepts,  each with a
few parts filled  in. Another domain which AM could  work in would be
elementary mechanics.

⊗4Can AM discover X? Why didn't  it...?⊗* It is difficult to  predict
whether AM  will (without modifications)  ever make a  specific given
discovery.  Although  its capabilities are small, its limitations are
hazy. What makes  the matter even  worse is that,  given a concept  C
which AM missed discovering, there is probably a reasonable heuristic
rule which is missing from AM, which would enable that discovery. One
danger in this  "debugging" is that a  rule will be added  which only
leads  to that  one desired  discovery, and  isn't good  for anything
else. This  must be  avoided at  all  costs, ⊗4even  at the  cost  of
intentionally giving up a certain discovery.⊗* If  the needed rule is
general  -- it has  many applications  and leads to  many interesting
results -- then it really was  an oversight not to include it in  AM.
Although I believe  that there are not too many  such omissions still
within the small realm in which AM works, there is no objective way to
demonstrate that, except by further long tests with AM.

.ONCE TURN ON "{}"

⊗4In what ways are new concepts created?⊗* Although the answer to
this is accurately given in Section {[2]HEURS}.{[2]CREATECON}
(namely, this is mainly the jurisdiction of the right sides of
heuristic rules), and although I dislike the simple-minded way it
makes AM sound, the list below does characterize the major ways in
which new concepts get born (a few of these constructions are
sketched in code just after the list):

.BN SELECT 6

Fill in examples of a  concept (e.g., by instantiating its  recursive
definition)

Create a  generalization of a  given concept (e.g., by  weakening its
definition)

Create a  specialization of a given concept (e.g., by restricting its
domain/range)

Compose  two   operations  f,g,   thereby  creating  a   new  one   h
[h(x)≡f(g(x))]

Coalesce an operation f into a new one g [g(x)≡f(x,x)]

Permute the order of the arguments of an operation [g(x,y)≡f(y,x)]

Invert an operation [g(x)=y  iff f(y)=x] (e.g., from Squaring, create
Square-rooting)

Canonize one  predicate P1  with respect  to a  more general  one  P2
[create  a new  concept  f,  an  operation, such  that:  P2(x,y)  iff
P1(f(x),f(y))]

Create  a new operation  g, which is  the repeated  application of an
existing operation f.

Trivial logical  combinations of existing  concepts x,y:  ⊗6x∧y, x∨y,
¬x,⊗* etc.

.E
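
A few of those constructions are just higher-order functions, and can
be rendered in a few lines (a Common Lisp sketch; AM of course built
full named concepts with facets, not bare closures):

.B816

; Compose two operations f,g into h, where h(x) = f(g(x)).
(defun compose (f g)
  (lambda (x) (funcall f (funcall g x))))

; Coalesce a binary operation f into g, where g(x) = f(x,x).
(defun coalesce (f)
  (lambda (x) (funcall f x x)))

; Permute the order of the arguments of a binary operation.
(defun permute-args (f)
  (lambda (x y) (funcall f y x)))

; E.g., coalescing Times yields Squaring:
; (funcall (coalesce #'*) 7)  =>  49

.E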

Below is a similar list, giving the primary ways in which AM
formulates new conjectures (a tiny sketch of the empirical testing
involved follows the list):

.BN SELECT 6

Notice that concept C1 is really an example of concept C2

Notice   that   concept   C1   is   really   a   specialization  (or:
generalization) of C2

Notice that C1 is equal to C2; or: ⊗4almost always⊗* equal

Notice that C1  and C2 are  related by some  known concept (e.g.,  if
operation C3 is run on an argument which is a C1, then its value will
always be a C2)

If an analogy exists, try to extend it to the conjectures as well.

.E SELECT 1
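
The empirical flavor of these noticings can be suggested by a tiny
sketch (Common Lisp; the helper is hypothetical -- AM tested its
conjectures against the Examples facets of the concepts involved):

.B816

; Fraction of the known examples on which two unary operations agree:
; a value of 1 suggests "C1 equals C2"; a value near 1 suggests
; "C1 is almost always equal to C2".
(defun agreement (f g examples)
  (/ (count-if (lambda (x) (equal (funcall f x) (funcall g x)))
               examples)
     (max 1 (length examples))))

.E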

In summary, we can say that AM has achieved its original purpose: to be
guided successfully by a large set of local heuristic rules, in the
discovery of new mathematical theories.
Besides creating new  concepts and noticing  conjectures, AM has  the
key  "ability" of appearing to rationally decide  what to work on
at  each  moment.   This  is  a  result of  the  agenda  of tasks  --
containing associated reasons.  Of course all of these abilities stem
from the  quality and the  quantity of local heuristic  rules: little
plausible move generators and evaluators.

. SSSEC(Current Limitations)

Below are several shortcomings of AM, which hurt its behavior but are
not believed to be
inherent  limitations of its design. They  are presented in order
of decreasing severity.

Perhaps the most  serious limitation on  AM's current behavior  arose
from the  lack of constraints  on left  sides of heuristic  rules. It
turned  out that this excessive  freedom made it  difficult for AM to
inspect and analyze  and synthesize its  own heuristics; such a  need
was not foreseen at the time AM was designed.  It was thought that the
power to manipulate heuristic rules  was an ability which the  author
must have, but which the system wouldn't require.   As it turned out,
AM did  successfully develop new concepts  several levels deeper than
the ones it  started with. But  as the new  concepts got further  and
further away from those initial ones, they had fewer and fewer
specific  heuristics filled in (since they had  to be filled in by AM
itself).  Gradually, AM found itself relying on heuristics which were
very  general compared  to the  concepts it  was dealing  with (e.g.,
forced to use  heuristics about Objects  when dealing with  Numbers).
Heuristics for  dealing with  heuristics do  exist, and their  number
could  be  increased.  This  is  not  an  easy  job:  finding  a  new
meta-heuristic is a tough process.  Heuristics are rarely more than
compiled hindsight; hence it's difficult to create new ones "before the fact".

.ONCE TURN ON "{}"

AM  has  no  notion  of Proof,  proof  techniques,  formal  validity,
heuristics  for finding counterexamples,  etc.  Thus  it never really
establishes any conjecture formally.  This could probably be remedied
by  adding  about  25 new  concepts  (and  their  100 new  associated
heuristics) dealing with such topics.  The needed concepts have  been
outlined on paper (and are present in Appendix {[2]ALLCON}). It would
probably require a few hundred hours to code and debug them.

The  user  interface is  quite  primitive, and  this  again  could be
dramatically improved with just a  couple hundred hours' work.   AM's
explanation  system  is  almost  nonexistent: the  user  must  ask  a
question  quickly, or AM will have  already destroyed the information
needed to  construct  an  answer. A  clean  record of  recent  system
history and a nice scheme for tracking down reasons does not exist at
the level which is found, e.g., in MYCIN.  There is no trivial way to
have  the system  print  out  its heuristics  in  a format  which  is
intelligible to the untrained user.

An  important  type of  analogy which  was  untapped by  AM  was that
between heuristics.  If two situations were similar,  conceivably the
heuristics useful  in one situation  might be useful (or  have useful
analogues)  in the new  situation.  Perhaps  this is a  viable way of
enlarging the  known heuristics.   Such "meta-level" activities  were
kept  to a minimum  throughout AM,  and this proved  to be  a serious
limitation. My intuition  tells me  that the  "right" ten  meta-rules
could correct this particular deficiency.

Several limitations arose from the constraints  of the agenda scheme,
from the  choice of finite set  theory as the domain  to work in, and
from the particular model of math research that was postulated. These
will be discussed in the next three subsections.

. SSSEC(Limitations of the Agenda scheme)

.COMMENT Should these DIFlabels go in the sec's header?;

.DIFSECNUM: SECNUM;

.DSEC: SECNUM;

.COMMENT Not sure about  that last label-- it might not  be meant for
.here;

.DIFSSECNUM: SSECNUM;

.DIFSSEC: SSECNUM;

.DIFSEC: PAGE;

The  following quibbles  with  the agenda  scheme get  less  and less
important.  When you get bored, skip to the next subsection.

Currently, it is difficult to include heuristics which  interact with
one another  in any significant  way. The  whole fibre of  the Agenda
scheme assumes perfect independence of heuristics. The global formula
used to  rate tasks on  the agenda  assumes perfect superposition  of
reasons:  there are  no  "cross-terms".   Is  this assumption  always
valid? Unfortunately ⊗4no⊗*, not even  for the limited domain AM  has
explored.  Sometimes, two reasons are very similar: "Examples of Sets
would permit  finding examples of Union" and  "Examples of Sets would
permit finding  examples of Intersection".  In that  case, their  two
ratings shouldn't cause  such a big increase in  the overall priority
value of the task "⊗6Fillin examples of Sets⊗*".
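
In caricature, the global formula simply superimposes the reasons'
ratings (AM's actual formula weighted and normalized them differently,
but made the same independence assumption):

.B816

; Caricature of the priority formula: ratings superimpose, with no
; cross-terms to discount near-duplicate reasons.
(defun task-priority (ratings)  ; ratings: a list of numbers, 0-1000
  (min 1000 (reduce #'+ ratings)))

; Two nearly-identical reasons rated 400 apiece still contribute 800
; between them -- just as much as two genuinely independent reasons.

.E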

Sometimes, a heuristic rule will want to ⊗4dissuade⊗* the system from
some activity. Thus a  ⊗4negative⊗* numeric contribution to  a task's
priority  value is  desired.  This is  not figured  into  the current
scheme. With a slight modification, the global formula could preserve
the sign (signum) of each reason's rating.

Tasks on the agenda list are ordered by their numeric priority value.
Each  reason's numeric  value is  kept,  too.   When new  reasons are
added, these  values are used  to recompute  a new  priority for  the
task. Each reason's rating was computed by a little formula found
inside  some  heuristic rule.  Those  formulae are  not  kept hanging
around.  One  big  improvement  in  apparent  intelligence  could  be
attained by tacking  on those little formulae to  the reasons. When a
new reason  is  added, the  old  reasons' rating  formulae  would  be
evaluated again. They  might indeed give  new numbers.   For example,
suppose  one reason was  "Few examples of  X are known".  But by now,
other tasks have meanwhile inadvertently filled in several examples
of X.   Then that little reason's  formula would come up  with a much
lower  value than it did  originally. In fact, the  value might be so
low that  the reason  was dropped  altogether. If  the formulae  were
kept, it might  be good practice to evaluate them  for the top two or
three tasks  on  the  agenda,  to see  if  they  might  change  their
ordering. Also, the top task's priority would  then be more accurate,
and recall that its  value is used to determine the cpu time and list
cell space quanta that the task is allowed to use up. At  the moment,
AM is not set up to store the little functions, and if modified to do
so, it would use up a lot more space than it could afford.  Also, the top
few jobs are almost  never semantically coupled (except by  "focus of
attention"), so  the precise order in which  they are executed rarely
matters.
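
The proposed improvement amounts to storing each reason's little
formula as a closure and re-running it on demand (a sketch; the
structure, the cutoff of 50, and the sample formula are all invented):

.B816

(defstruct reason label formula rating) ; formula: a closure of no args

; Re-evaluate each reason's formula, dropping any reason whose value
; has sunk too low; the task's priority is then recomputed from the
; surviving ratings.
(defun refresh-reasons (reasons)
  (dolist (r reasons)
    (setf (reason-rating r) (funcall (reason-formula r))))
  (remove-if (lambda (r) (< (reason-rating r) 50)) reasons))

; E.g., "Few examples of X are known" might carry the formula
;   (lambda () (if (< (length (examples-of 'X)) 3) 400 100))
; so that inadvertently filling in examples of X deflates it.

.E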

Perhaps what is needed is not a single priority value for each  task,
but a vector of  numbers. At each cycle, AM would  construct a vector
of its current "interests" and needs, and each task's vector would be
dot-multiplied  against this  global  vector  of  AM's  desires.  The
highest scorer would  then be chosen.  For example,  one dimension of
the  rating  could  be  "safety", and  one  could  be  "best possible
payoff", one could be "average expected payoff", etc.   Sometimes, AM
would  have to break  out of  a stagnant situation,  and it  would be
willing to try riskier tasks than usual.  This was not implemented
because of  the great  increase in cpu  time it  would cause. It  is,
however,  probably a better  design than  the current one.  Even more
intelligent schemes  can be  envisioned --  involving  more and  more
symbolic data being stored with  each task. Ultimately, this would be
just  the English reasons themselves; by  that time, the task-orderer
would have  grown into  an incredibly  complex AI  program itself  (a
natural language program plus an interrelator plus...).
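
A sketch of that design (Common Lisp; the dimensions and the
representation are invented):

.B816

; Each task carries a vector of ratings along several dimensions
; (safety, best possible payoff, average expected payoff, ...); AM's
; momentary desires form another vector; take the best dot product.
(defun dot (u v) (reduce #'+ (mapcar #'* u v)))

(defun pick-task (tasks desires) ; tasks: a list of (task . vector)
  (car (reduce (lambda (a b)
                 (if (>= (dot (cdr a) desires) (dot (cdr b) desires))
                     a
                     b))
               tasks)))

.E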

The agenda list should really be an agenda ⊗4tree⊗*, since the
ordering of tasks is only partial, not total. If this is
clear,  then  skip the  rest  of  this  paragraph.   There  are  some
"legitimate" orderings of tasks on the agenda; if task X is supported
by a subset of the reasons which support Y, then clearly the priority
of X should be less than or equal to the priority of Y.  Two tasks
of the form "Fillin examples of A" and "Fillin examples of B" can be
ordered simply because  A is currently  much more interesting than  B.
But  often, two tasks  will have  no ironclad ordering  between them:
compare "Fillin  examples  of  Sets" and  "Check  generalizations  of
Union".  Thus the ordering is only partial, and it is the artifice of
the  global  evaluation  function which  embeds  this  into a  linear
ordering.  If multiprocessors are  used, it might be advantageous  to
keep the original partial ordering around.
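
The subset-of-reasons constraint is easy to state precisely.  In a
minimal sketch (the task representation is hypothetical):

.BEGIN NOFILL

    # The only forced orderings come from subset relations among
    # reason sets: if X's reasons form a subset of Y's, then
    # priority(X) <= priority(Y).  All other pairs are incomparable --
    # a partial order, which a multiprocessor could exploit by running
    # incomparable tasks in parallel.
    def comparable(x_reasons, y_reasons):
        x, y = set(x_reasons), set(y_reasons)
        return x <= y or y <= x

    fillin_sets = {"Few examples of Sets", "Sets is interesting"}
    check_union = {"Generalizations of Union were never checked"}
    print(comparable(fillin_sets, check_union))  # False: no forced order

.END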

. SSSEC(Choice of Domain)

Here are some questions this subsection will address:

.BN

⊗8# ⊗*  What are  the inherent  limitations --  and advantages --  in
fixing a domain for AM to work in?

⊗8# ⊗*  What characteristics are favorable  to automating research in
any given domain?

⊗8# ⊗*  What are  the  specific reasons  for and  against  elementary
finite set theory as the chosen starting domain?

.E

Research in  various domains  of science  and math  proceeds slightly
differently.   For  example, psychology  is interested  in explaining
people, not in creating new  kinds of people. Math is  not interested
in individual entities so much as in new kinds of entities. There are
ethical restrictions on physicians which prevent certain  experiments
from being  done. Political  experiments rarely permit  backtracking,
etc.  Each field has its own peculiarities.

If we  want a system to work in many  domains, we'd have to sacrifice
some power.$$ This  is assuming a  system of a  given fixed size.  If
this restriction  isn't present, then a  reasonable "general-purpose"
system  could  be  built  as  several  systems  linked  by one  giant
switch.$ Within a given field of knowledge (like math), the finer the
category we limit ourselves  to, the more specific are the heuristics
which become available.   So  it was  reasonable to  make this  first
attempt limited to one narrow domain.

This brings  up  the choice  of domain.  What  should it  be? As  the
DENDRAL  project illustrated so clearly$$  see [Buchanan, Feigenbaum,
et al.]. The choice of subject was enabled by J. Lederberg in 1965.  $,
choice  of  subject  domain  is quite  important  when  studying  how
researchers  discover and  develop their  theories.   Mathematics was
chosen as the domain of this investigation, because

.BN

λλ In doing  math research, one  needn't cope with the  uncertainties
and fallibility of testing equipment; that is, there are no
uncertainties in the data (compared to, e.g., chemical identification
from mass spectrograms).

λλ Reliance on  experts' introspections is  one of the  most powerful
techniques  for  codifying the  judgmental criteria  necessary  to do
effective work in a field;  I personally have had enough training  in
elementary mathematics  so that I  didn't have to rely  completely on
external  sources for  guidance in formulating  such heuristic rules.
Also, several excellent sources were available [Polya, Skemp,
Hadamard, Kershner, etc.].

λλ A hard science is of  course easier to work in than a soft one; to
automate research in  psychology would require  more knowledge  about
human information  processing than now  is known,  because psychology
deals with entities as complex as you and I. Also, in a hard science,
the ⊗4languages⊗*  to  communicate  information can  be  simple  even
though the ⊗4messages⊗* themselves may be sophisticated.

λλ  Since mathematics  can deal  with any  conceivable constructs,  a
researcher there is not limited to explaining observed data.  Related
to this is the freedom to investigate -- or to give up on -- whatever
the researcher  wants to. There is  no single discovery  which is the
"goal", no given problem to solve, no right or wrong behavior.

λλ Unlike "simpler" fields, such as propositional logic, there is  an
abundance of heuristic rules available for the picking.

.E

The limitations of math as a domain  are closely intertwined with its
advantages.   Having no  ties to real-world  data can be  viewed as a
limitation, as can having no  clear goal. There is always the  danger
that  AM will  give up  on each  theory as  soon as  the first  tough
obstacle crops up.

Since math has been worked on for millennia by some of the greatest
minds from  many  different cultures,  it is  unlikely  that a  small
effort  like  AM  would make  any  new  inroads,  have any  startling
insights.  In that respect, Dendral's task was a lot more  promising:
humans had not yet studied all isomers of common cyclic organic
molecules  in a systematic  way (they are  too numerous).   Of course
math -- even at the elementary level that AM explored it -- still has
undiscovered gems (e.g., the recent unearthing of Conway's numbers).

As  Lederberg   commented$$  Joshua  Lederberg's  review   of  Joseph
Weizenbaum's book,...$, AI can succeed in automating an activity only
when a  "strong theory" of  that activity  exists. AM  is built on  a
detailed  model  of  how  humans  do  math research.    In  the  next
subsection, we'll discuss the model of math research that AM assumes.

Before that,  consider  for a  moment  which other  fields  of  human
endeavor have a good model, and also  enjoy all the advantages listed
above:  Other domains of  math, classical  physics, ..?.. --  not too
many.

. SSSEC(Limitations of the Model of Math Research)

AM, like  anything else in  this world, is  constrained by a  mass of
assumptions. Most of these are "compiled" or interwoven into the very
fabric of AM,  hence can't  be tested by  experiments on AM.   AM  is
built around  a particular  model of  how mathematicians  actually go
about   doing  their   research.     This  model  was   derived  from
introspection, but can  be supported by  quotes from Polya, Kershner,
Hadamard, Skemp, and many others.  No attempt will be made to justify
any of  these  premises.    Here is  a  simplified  summary  of  that
information processing model for math theory formation:

.ONCE NARROW 2,2 SELECT 8 TURN OFF "α" TURN ON "∞→"

⊂∞α→⊃

.MODELPAGE: PAGE;

.MODELSSEC: SSECNUM;

.MODELSSSEC: SSSECNUM;

.MODELSEC: SECNUM;

.BREAK; BN NARROW 2,2 SINGLE SPACE PREFACE 0

λλ The order in which a math textbook presents a theory is almost the
exact opposite  of the order in which  it was actually discovered and
developed.  In a text, new  definitions are stated with little or  no
motivation, and they turn out to be just the ones needed to state the
next big  theorem, whose proof then magically appears. In contrast, a
mathematician  doing   research  will   examine  some   already-known
concepts, perhaps trying to find some regularity in experimental data
involving them. The patterns he  notices are the conjectures he  must
investigate further, and these relationships directly motivate him to
make new definitions.

λλ  Each step  the  researcher takes  while developing  a  new theory
involves choosing from  a large set of  "legal" alternatives --  that
is,  searching.   The  key to  keeping this  from  becoming a  blind,
explosive  search  is the  proper  use of  evaluation  criteria. Each
mathematician uses his own  personal heuristics to choose  the "best"
alternative available.

λλ   Non-formal   criteria   (aesthetic  interestingness,   inductive
inference from  empirical evidence,  analogy, and  utility) are  much
more  important   than   formal  deductive   methods  in   developing
mathematically   worthwhile   theories,   and   in  avoiding   barren
diversions.

λλ Progress in ⊗4any⊗*  field of mathematics demands much  non-formal
heuristic  expertise  in  ⊗4many⊗*  different  "nearby"  mathematical
fields.  So a  broad, universal  core of  knowledge must  be mastered
before any single theory can meaningfully be developed.

λλ It is sufficient (and  pragmatically necessary) to have and  use a
large  set  of  informal  heuristic  rules. These  rules  direct  the
researcher's next activities, depending  on the current situation  he
is  in.   These rules  can  be assumed  to  superimpose ideally:  the
combined  effect of several rules  is just the sum  of the individual
effects.

λλ The  necessary  heuristic rules  are  virtually  the same  in  all
branches of mathematics,  and at all levels of  sophistication.  Each
specialized  field will  have some of  its own  heuristics; those are
normally much more powerful than the general-purpose heuristics.

λλ For true understanding, the researcher should  grasp$$ Have access
to,  relate to,  store,  be able  to  manipulate, be  able to  answer
questions about  $  each  concept  in  several  ways:  declaratively,
abstractly, operationally,  knowing  when it  is relevant,  and as  a
bunch of examples.

λλ  Common metaphysical assumptions about  nature and science: Nature
is  fair,   uniform,   and  regular.   Coincidences   have   meaning.
Statistical  considerations are  valid when  looking at  mathematical
data.  <<Perhaps incorporate some of Iberall's metaphysics here.>

.WIDEN 2,2

.ONCE SELECT 8 TURN OFF "α" TURN ON "∞→"

%∞α→$

.END

. SSSEC(Ultimate powers and weaknesses)

Consider now  ⊗4any⊗* system which  is consistent with  the preceding
model  of math  research, and  whose orientation  is to  discover and
develop new (to the system) mathematical theories.  This  includes AM
itself, but  might also include  a bright high-school senior  who has
been taught a large body of heuristic rules.

What  can such  systems ultimately  achieve? What are  their ultimate
limits?    Answers  to  ultimate  questions  are  hard   to  come  by
experimentally,  so  this  discussion  will be  quite  philosophical,
emotional, and short.  The model of math research hinges on the
use of heuristic rules for guidance at all  levels of behavior. It is
questionable  whether  or  not  all  known  mathematics could  evolve
smoothly in this way.   As a first  order fixup, we've mentioned  the
need to  provide good meta-heuristics,  to keep enlarging the  set of
heuristics.   If this is not  enough (if meta-meta-...-heuristics are
needed),  then  the model  is  a  poor  one  and  has  some  inherent
limitations.   If some discoveries can only  be made irrationally, by
random chance, then  any such  system would be  incapable of  finding
those concepts.

Turning aside  from math,  what about  systems whose  design -- as  a
computer  program -- is  similar to AM?$$  Having an  agenda of tasks
with reasons and reason-ratings  combining to form a global  priority
for  each  task,  having  units/modules/frames/Beings/Actors/concepts
which  have parts/slots/facets, etc. Heuristic  rules are tacked onto
relevant concepts,  and are  executed  to produce  new concepts,  new
tasks, new facet entries. $  Building such systems will be "fun", and
perhaps  will   result   in   new  discoveries   in   other   fields.
Eventually, scientists  (at least  in  a few  very  hard domains)  may
relegate  more and more  of their  "hack" research duties  to AM-like
systems.   The  ultimate  limitations  will  be  those  arising  from
incorrect (e.g.,  partial) models  of the  activities the  system must
perform.    The systems  themselves  may help  improve  these models:
experiments that are performed  on the systems are actually  tests of
the underlying model; the results might cause revisions to be made in
the model, then in the system, and the whole cycle would begin again.

.SSEC(Final Conclusions)

<<This "final conclusions" section is not written yet.>

This is the final section in the body of the thesis.

It should summarize the entire project, emphasizing the
overall importance (contribution to knowledge), the 
interesting details worth remembering (agenda with rules, a couple
new bits of knowledge), summarizing judgments, etc.

Perhaps: Segue into Appendix 1 [model of scientific research as heuristic search].